NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

4Diff: 3D-Aware Diffusion Model for Third-to-First Viewpoint Translation

Cheng, Feng; Luo, Mi; Wang, Huiyu; Dimakis, Alex; Torresani, Lorenzo; Bertasius, Gedas; Grauman, Kristen (May 2025, ECCV '24 https://openreview.net/forum?id=nReyoIseTD)

Abstract. We present 4Diff, a 3D-aware diffusion model addressing the exo-to-ego viewpoint translation task: generating first-person (egocentric) view images from the corresponding third-person (exocentric) images. Building on the ability of diffusion models to generate photorealistic images, we propose a transformer-based diffusion model that incorporates geometry priors through two mechanisms: (i) egocentric point cloud rasterization and (ii) 3D-aware rotary cross-attention. Egocentric point cloud rasterization converts the input exocentric image into an egocentric layout, which is subsequently used by a diffusion image transformer. As a component of the diffusion transformer's denoiser block, the 3D-aware rotary cross-attention further incorporates 3D information and semantic features from the source exocentric view. 4Diff achieves state-of-the-art results on the challenging and diverse Ego-Exo4D multiview dataset and exhibits robust generalization to novel environments not encountered during training. Our code, processed data, and pretrained models are publicly available at https://klauscc.github.io/4diff.
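The abstract names egocentric point cloud rasterization without detailing it. One common way to build such a layout is forward warping: lift each exocentric pixel to 3D using its depth, move it into the egocentric camera frame, project it with a pinhole model, and keep the nearest point per target pixel (a z-buffer). The Python sketch below illustrates that generic technique only; it is not the authors' implementation. It assumes per-pixel metric depth for the exocentric image, pinhole intrinsics passed as hypothetical `(fx, fy, cx, cy)` tuples, and a known rotation `R` and translation `t` mapping exo-camera coordinates to ego-camera coordinates. All function and type names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical types for this sketch: an RGB color, pinhole intrinsics (fx, fy, cx, cy),
# a 3x3 rotation matrix, and a 3D point / translation vector.
Color = Tuple[int, int, int]
Intrinsics = Tuple[float, float, float, float]
Mat3 = Sequence[Sequence[float]]
Vec3 = Tuple[float, float, float]


def unproject(u: int, v: int, depth: float, K: Intrinsics) -> Vec3:
    """Lift pixel (u, v) with the given depth to a 3D point in that camera's frame."""
    fx, fy, cx, cy = K
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def rigid_transform(p: Vec3, R: Mat3, t: Vec3) -> Vec3:
    """Apply p' = R p + t (assumed exo-camera -> ego-camera transform)."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def rasterize_to_ego(
    colors: List[List[Color]],
    depths: List[List[float]],
    K_exo: Intrinsics,
    K_ego: Intrinsics,
    R: Mat3,
    t: Vec3,
    out_h: int,
    out_w: int,
) -> List[List[Optional[Color]]]:
    """Forward-warp an exo RGB-D image into an ego-view layout using a z-buffer.

    Ego pixels that receive no point stay None (holes left for a downstream model).
    """
    fx, fy, cx, cy = K_ego
    layout: List[List[Optional[Color]]] = [[None] * out_w for _ in range(out_h)]
    zbuf = [[float("inf")] * out_w for _ in range(out_h)]
    for v, row in enumerate(depths):
        for u, d in enumerate(row):
            if d <= 0:  # skip pixels with missing or invalid depth
                continue
            X, Y, Z = rigid_transform(unproject(u, v, d, K_exo), R, t)
            if Z <= 0:  # point lies behind the ego camera
                continue
            pu = int(round(fx * X / Z + cx))
            pv = int(round(fy * Y / Z + cy))
            # Keep only the nearest point that lands on each ego pixel.
            if 0 <= pu < out_w and 0 <= pv < out_h and Z < zbuf[pv][pu]:
                zbuf[pv][pu] = Z
                layout[pv][pu] = colors[v][u]
    return layout
```

With an identity rotation, zero translation, and identical intrinsics, the layout reproduces the input image; under a real exo-to-ego viewpoint change, many ego pixels stay `None`, which is presumably why the layout serves as conditioning for a generative model rather than as the final output.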
Free, publicly accessible full text available May 19, 2026
ARCADE: Scalable Demonstration Collection and Generation via Augmented Reality for Imitation Learning

Yang, Yue; Ikeda, Bryce; Bertasius, Gedas; Szafir, Daniel (October 2024, 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS))

Robot Imitation Learning (IL) is a crucial technique in robot learning, where agents learn by mimicking human demonstrations. However, IL encounters scalability challenges stemming from both non-user-friendly demonstration collection methods and the extensive time required to amass a sufficient number of demonstrations for effective training. In response, we introduce the Augmented Reality for Collection and generAtion of DEmonstrations (ARCADE) framework, designed to scale up demonstration collection for robot manipulation tasks. Our framework combines two key capabilities: 1) it leverages AR to make demonstration collection as simple as users performing daily tasks using their hands, and 2) it enables the automatic generation of additional synthetic demonstrations from a single human-derived demonstration, significantly reducing user effort and time. We assess ARCADE's performance on a real Fetch robot across three robotics tasks: 3-Waypoints-Reach, Push, and Pick-And-Place. Using our framework, we were able to rapidly train a policy using vanilla Behavioral Cloning (BC), a classic IL algorithm, which excelled across these three tasks. We also deploy ARCADE on a real household task, Pouring-Water, achieving an 80% success rate.
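The abstract trains its policy with vanilla Behavioral Cloning (BC), which treats imitation as supervised regression from observed states to demonstrated actions. The Python sketch below shows that idea in its simplest form, assuming low-dimensional state vectors, a linear policy `a = W s + b`, and full-batch gradient descent on mean squared error. The policy class, optimizer, and function names are hypothetical choices for illustration: the abstract does not specify ARCADE's network or training setup, and the sketch does not reproduce ARCADE's synthetic demonstration generation.

```python
from typing import List, Sequence, Tuple

Vec = List[float]
Demo = Tuple[Sequence[float], Sequence[float]]  # hypothetical (state, action) pair


def predict(W: List[Vec], b: Vec, s: Sequence[float]) -> Vec:
    """Linear policy (an assumption for this sketch): action = W @ state + b."""
    return [sum(w * x for w, x in zip(row, s)) + bias for row, bias in zip(W, b)]


def behavioral_cloning(
    demos: List[Demo], lr: float = 0.1, epochs: int = 1000
) -> Tuple[List[Vec], Vec]:
    """Fit a linear policy to demonstrated (state, action) pairs by minimizing MSE."""
    state_dim, action_dim = len(demos[0][0]), len(demos[0][1])
    W = [[0.0] * state_dim for _ in range(action_dim)]
    b = [0.0] * action_dim
    n = len(demos)
    for _ in range(epochs):
        grad_W = [[0.0] * state_dim for _ in range(action_dim)]
        grad_b = [0.0] * action_dim
        for s, a in demos:
            pred = predict(W, b, s)
            for i in range(action_dim):
                err = pred[i] - a[i]  # residual for action dimension i
                grad_b[i] += 2.0 * err / n
                for j in range(state_dim):
                    grad_W[i][j] += 2.0 * err * s[j] / n
        # Full-batch gradient descent step on the mean squared error.
        for i in range(action_dim):
            b[i] -= lr * grad_b[i]
            for j in range(state_dim):
                W[i][j] -= lr * grad_W[i][j]
    return W, b
```

Replacing the linear map with a neural network and full-batch descent with a stochastic optimizer gives the usual deep BC setup; the supervised-regression structure stays the same.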
Full Text Available
LoCoNet: Long-Short Context Network for Active Speaker Detection

https://doi.org/10.1109/cvpr52733.2024.01747

Wang, Xizi; Cheng, Feng; Bertasius, Gedas (June 2024, IEEE)

Full Text Available
Learning to Retrieve Videos by Asking Questions

https://doi.org/10.1145/3503161.3548361

Madasu, Avinash; Oliva, Junier; Bertasius, Gedas (October 2022, Proceedings of the 30th ACM International Conference on Multimedia)

Full Text Available
VindLU: A Recipe for Effective Video-and-Language Pretraining

https://doi.org/10.1109/CVPR52729.2023.01034

Cheng, Feng; Wang, Xizi; Lei, Jie; Crandall, David; Bansal, Mohit; Bertasius, Gedas (June 2023, IEEE)

Full Text Available
Vision Transformers are Parameter-Efficient Audio-Visual Learners

https://doi.org/10.1109/CVPR52729.2023.00228

Lin, Yan-Bo; Sung, Yi-Lin; Lei, Jie; Bansal, Mohit; Bertasius, Gedas (June 2023, IEEE)

Full Text Available
